Universidad de Sonora

Data science on the Dark web

Jonnathan Axel Uribe Enriquez

Introduction

One of the main problems with buying on the Dark web (problem #1) is the risk of going to prison.

The tedious side of black-market shopping is maintaining anonymity, which is essential in the virtual world for this kind of transaction, for buyers and sellers alike. But once inside, what can we observe?

In [3148]:
Image(filename="img/dream.jpg", width=1000, height=1000)
Out[3148]:

From the website to the dataset

In [3149]:
Image(filename="img/tabla.jpg", width=1000, height=1000)
Out[3149]:

Dataset information

In [3153]:
print(perico.columns)
print("Total columns:", len(perico.columns))
print("Total rows:", len(perico))
Index(['Unnamed: 0', 'product_title', 'ships_from_to', 'grams', 'quality',
       'btc_price', 'cost_per_gram', 'cost_per_gram_pure', 'escrow',
       'product_link', 'vendor_link', 'vendor_name', 'successful_transactions',
       'rating', 'ships_from', 'ships_to', 'ships_to_US', 'ships_from_US',
       'ships_to_NL', 'ships_from_NL', 'ships_to_FR', 'ships_from_FR',
       'ships_to_GB', 'ships_from_GB', 'ships_to_CA', 'ships_from_CA',
       'ships_to_DE', 'ships_from_DE', 'ships_to_AU', 'ships_from_AU',
       'ships_to_EU', 'ships_from_EU', 'ships_to_ES', 'ships_from_ES',
       'ships_to_N. America', 'ships_from_N. America', 'ships_to_BE',
       'ships_from_BE', 'ships_to_WW', 'ships_from_WW', 'ships_to_SI',
       'ships_from_SI', 'ships_to_IT', 'ships_from_IT', 'ships_to_DK',
       'ships_from_DK', 'ships_to_S. America', 'ships_from_S. America',
       'ships_to_CH', 'ships_from_CH', 'ships_to_BR', 'ships_from_BR',
       'ships_to_CZ', 'ships_from_CZ', 'ships_to_SE', 'ships_from_SE',
       'ships_to_CO', 'ships_from_CO', 'ships_to_CN', 'ships_from_CN',
       'ships_to_PL', 'ships_from_PL', 'ships_to_GR', 'ships_from_GR'],
      dtype='object')
Total columns: 64
Total rows: 1504

The dataset consists of 64 columns and 1504 rows. Some variables have names that vaguely describe the contents of their columns, but they need to be broken down to know which techniques and treatments can be applied.
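Under the hood `perico` is presumably a pandas DataFrame loaded from the scraped CSV; a minimal loading sketch, where an inline two-row sample stands in for the real file (whose path is not shown in the notebook):

```python
import io
import pandas as pd

# Inline stand-in for the scraped CSV; the real notebook would call
# pd.read_csv() on the actual file instead.
sample_csv = io.StringIO(
    "product_title,grams,quality,btc_price\n"
    "0.5G COCAINE 89%,0.5,89,0.0129\n"
    "10 Gram 87% Pure Uncut Colombian Cocaine,10,87,0.258\n"
)
perico_sample = pd.read_csv(sample_csv)

print("Total columns:", len(perico_sample.columns))
print("Total rows:", len(perico_sample))
```
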

Description of the variables

'product_title'

Contains a brief description of the product the user sells.

'ships_from_to'

Where the product ships from and where it ships to.

'grams'

Number of grams the vendor ships. Some vendors advertise as "0.5G COCAINE 89% " and others as "10 Gram 87% Pure Uncut Colombian Cocaine".

'quality'

The quality of the cocaine on sale. As an example, the so-called "Yen" is purer (close to 98%) and whiter, flakier and shinier. Other types of powdered cocaine have lower purity and a dustier, duller appearance. This happens because a series of chemical substances, some highly toxic, are added before sale. The fewer substances added, the purer the cocaine and the higher its quality is considered.

'btc_price'

Price in bitcoins of the product on sale.

'cost_per_gram'

Cost per gram of non-pure cocaine.

'cost_per_gram_pure'

Cost per gram of pure cocaine.

'product_link'

Link to the product.

'vendor_link'

Link to the vendor.

'vendor_name'

Name of the vendor.

'successful_transactions'

Number of successful transactions.

'rating'

Vendor rating.

'ships'

The remaining variables in the dataset carry geographic information indicating specifically where the product ships from and where it ships to. They look redundant because of the way the dataset was built, but they can still be worked with.
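Since the `ships_to_XX` / `ships_from_XX` flags are just a one-hot expansion of `ships_from_to`, they can be collapsed back into one list per row; a sketch on two invented listings using the dataset's column naming:

```python
import pandas as pd

# Two invented listings with the dataset's one-hot destination layout.
df = pd.DataFrame({
    "ships_to_US": [1, 0],
    "ships_to_NL": [0, 1],
    "ships_to_GB": [1, 1],
})

ship_cols = [c for c in df.columns if c.startswith("ships_to_")]
# Recover, for each row, the list of destination country codes.
df["destinations"] = df[ship_cols].apply(
    lambda row: [c.replace("ships_to_", "") for c in ship_cols if row[c] == 1],
    axis=1,
)
print(df["destinations"].tolist())  # [['US', 'GB'], ['NL', 'GB']]
```
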

Data analysis

Analysis of the dataset identified 5 traits that can be highly important for understanding and determining the behavior of users on the Dark web site.

  • Main parties involved. Which countries are the most affected?
  • Factors that influence the product's price
  • Quality
  • Quantity
  • Vendors

Main parties involved: which countries are the most affected?

In [3160]:
Image(filename="img/mapa_calor.jpg", width=1000, height=1000)
Out[3160]:
In [3162]:
sns.set(font_scale=2)
plt.figure(figsize=(30, 15))
plt.margins(0.8)
plt.title('Shipments per country')
sns.barplot(x=df_value_counts['ships_from'], y=df_value_counts['cantidad_envios'])
Out[3162]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f327821860>
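The `df_value_counts` frame plotted above is presumably derived from `value_counts()` on the origin column; a self-contained sketch with invented country codes:

```python
import pandas as pd

# Stand-in for perico["ships_from"] (values invented).
ships_from = pd.Series(["GB", "NL", "GB", "DE", "GB", "NL"])

df_value_counts = (
    ships_from.value_counts()             # counts, sorted descending
    .rename_axis("ships_from")            # index name -> country column
    .reset_index(name="cantidad_envios")  # shipment count, as named in the plot
)
print(df_value_counts)
```
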

Why are they the main vendors?

  • They are not producers like Colombia or Mexico

Where does all the drug they sell come from?

Why don't the main producers show up?

  • Colombia and Mexico are not affected by online drug sales.

Behavior according to the United Nations Office on Drugs and Crime (UNODC), taken from the years 2013–2017: https://wdr.unodc.org/wdr2019/prelaunch/WDR19_Booklet_1_EXECUTIVE_SUMMARY.pdf

In [3163]:
Image(filename="img/envios_coc.png", width=1000, height=1000)
Out[3163]:

Quality and quantity

In [3165]:
print("Average quality:", calidad_promedio)
print("Quality mode:", calidad_moda)
Average quality: 88.52646276595746
Quality mode: 0    90.0
dtype: float64
In [3166]:
plt.figure(figsize=(15, 7))
plt.title('Distribution of the quality variable')
sns.distplot(perico['quality'])
Out[3166]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f315b0f828>
In [3167]:
plt.figure(figsize=(15, 5))
plt.title('Boxplot of the quality variable')
perico.boxplot(column=['quality'], grid=True, vert=False)
Out[3167]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f315b0ff28>
In [3168]:
cantidad_promedio = perico['grams'].mean()
cantidad_moda = perico['grams'].mode()
print("Average quantity:", cantidad_promedio)
print("Quantity mode:", cantidad_moda)
Average quantity: 59.06216090425532
Quantity mode: 0    1.0
dtype: float64
In [3169]:
plt.figure(figsize=(15, 7))
plt.title('Distribution of the grams variable')
sns.distplot(perico['grams'])
Out[3169]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f30dae41d0>
In [3170]:
plt.figure(figsize=(16, 6))
plt.title('Boxplot of the grams variable')
perico.boxplot(column=['grams'], grid=True, vert=False)
Out[3170]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f300b517f0>

How much does it cost? Where is it most expensive? What factors influence the price?

In [3172]:
# sns.set(font_scale=2)
plt.figure(figsize=(30, 15))
plt.margins(0.8)
plt.title('Cost per gram')
sns.barplot(x=data2['ships_from'], y=data2['cost_per_gram'])
Out[3172]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f37381db00>
In [3173]:
sns.set(font_scale=2)

f, ax = plt.subplots(figsize = (30,15))
sns.set_color_codes('pastel')
sns.barplot(x = 'ships_from', y = 'cost_per_gram_pure', data = data2,
            label = 'cost_per_gram_pure', color = 'b', edgecolor = 'w')
sns.set_color_codes('muted')
sns.barplot(x = 'ships_from', y = 'cost_per_gram', data = data2,
            label = 'cost_per_gram', color = 'b', edgecolor = 'w')
ax.legend(ncol = 2, loc = 'upper right')
plt.show()
In [3179]:
Image(filename="img/calor2.png", width=1000, height=1000)
Out[3179]:
In [3180]:
Image(filename="img/australia.jpg", width=700, height=700)
Out[3180]:

“Getting anything through customs is really hard. They’ve got really strict border control”

Main cause of the high price:

  • In producer countries like Colombia, the price of cocaine per gram is approximately 1.5 euros -> 32.34 pesos.

  • To be exported it has to go through Venezuela; just by crossing the border, the price in Venezuela rises to 6 euros -> 125.38 pesos.

  • Once it crosses the Atlantic, in Barcelona (Spain) the price rises to 40 euros -> 835.92 pesos.

  • Finally, outside Spain it rises to 80 euros -> 1721.29 pesos.
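The route above implies very large step-by-step markups; a quick arithmetic check using the euro figures from the list (peso conversions left out):

```python
# Price per gram, in euros, at each stage of the route (figures from the text).
stages = [
    ("Colombia", 1.5),
    ("Venezuela", 6.0),
    ("Barcelona", 40.0),
    ("Rest of Europe", 80.0),
]

prev = None
for place, price in stages:
    note = "" if prev is None else f"  (x{price / prev:.1f} vs previous stage)"
    print(f"{place}: {price} EUR/g{note}")
    prev = price

# Cumulative markup from the producer country to the European street price.
total_markup = stages[-1][1] / stages[0][1]
print(f"Total markup: x{total_markup:.1f}")
```
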

Machine learning

Supervised learning

Supervised learning is a set of techniques for making future predictions based on behaviors or characteristics found in labeled historical data.

Unsupervised learning

Unsupervised learning is a machine learning method in which a model is fitted to the observations. It differs from supervised learning in that there is no a priori knowledge (no labels).
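The contrast can be made concrete with scikit-learn on toy data: the regression is given labels `y` to learn from, while the clustering only sees the observations (all numbers below are invented):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])

# Supervised: each observation comes with a known label y.
y = np.array([2.0, 4.0, 6.0, 20.0, 22.0, 24.0])
reg = LinearRegression().fit(X, y)
print("learned slope:", reg.coef_[0])        # recovers y = 2x

# Unsupervised: no labels; KMeans just groups the observations.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster sizes:", np.bincount(km.labels_))
```
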

In [3181]:
sns.set(font_scale=1)
g = sns.PairGrid(perico, vars=["cost_per_gram", "quality","cost_per_gram_pure", "rating","grams","successful_transactions","btc_price"])
g = g.map_diag(plt.hist, edgecolor="w")
g = g.map_offdiag(plt.scatter, edgecolor="w", s=80)
In [3186]:
g=sns.pairplot(datoscoke, vars=["cost_per_gram", "cost_per_gram_pure"])
g.fig.set_size_inches(14,7)

Regression model

Simple linear regression to predict the price of cocaine

In [3187]:
lm = smf.ols(formula="cost_per_gram ~ cost_per_gram_pure", data=datoscoke).fit()
In [3190]:
lm.summary()
Out[3190]:
OLS Regression Results
Dep. Variable: cost_per_gram R-squared: 0.972
Model: OLS Adj. R-squared: 0.972
Method: Least Squares F-statistic: 5.170e+04
Date: Thu, 05 Mar 2020 Prob (F-statistic): 0.00
Time: 11:01:59 Log-Likelihood: 6763.9
No. Observations: 1504 AIC: -1.352e+04
Df Residuals: 1502 BIC: -1.351e+04
Df Model: 1
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
Intercept 0.0007 0.000 4.929 0.000 0.000 0.001
cost_per_gram_pure 0.8621 0.004 227.384 0.000 0.855 0.870
Omnibus: 1051.362 Durbin-Watson: 1.618
Prob(Omnibus): 0.000 Jarque-Bera (JB): 29489.955
Skew: -2.876 Prob(JB): 0.00
Kurtosis: 23.916 Cond. No. 54.6


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [3192]:
datoscoke.plot(kind="scatter",x="cost_per_gram_pure", y="cost_per_gram")
plt.plot(pd.DataFrame(datoscoke["cost_per_gram_pure"]),coca_pred, c="red", linewidth=2)
Out[3192]:
[<matplotlib.lines.Line2D at 0x1f32843c518>]
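The red line above uses `coca_pred`, presumably the fitted values of the simple model; a self-contained sketch of the same pattern on an invented two-column frame (names mirror the notebook):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Invented stand-in for datoscoke: cost_per_gram is an exact linear
# function of cost_per_gram_pure, so the fit is perfect.
toy = pd.DataFrame({"cost_per_gram_pure": [0.01, 0.02, 0.03, 0.04]})
toy["cost_per_gram"] = 0.001 + 0.86 * toy["cost_per_gram_pure"]

lm = smf.ols(formula="cost_per_gram ~ cost_per_gram_pure", data=toy).fit()
coca_pred = lm.predict(toy[["cost_per_gram_pure"]])  # points on the fitted line
print(coca_pred.round(5).tolist())
```
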

Multiple regression model to predict cost per gram

  • Cost per gram of pure cocaine
  • Rating
  • Quality
  • Number of grams
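From the summary below, `lm3` apparently also includes `ships_from` as a numeric (label-encoded) predictor; a hedged sketch of fitting such a formula on invented data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Invented stand-in for datoscoke with the predictors listed above.
rng = np.random.default_rng(0)
n = 50
toy = pd.DataFrame({
    "grams": rng.uniform(0.5, 100, n),
    "ships_from": rng.integers(0, 20, n).astype(float),  # label-encoded origin
    "quality": rng.uniform(70, 98, n),
    "rating": rng.uniform(4.0, 5.0, n),
    "cost_per_gram_pure": rng.uniform(0.005, 0.1, n),
})
toy["cost_per_gram"] = (
    -0.037 + 0.87 * toy["cost_per_gram_pure"] + 0.0004 * toy["quality"]
    + rng.normal(0, 0.0005, n)  # small noise
)

lm3 = smf.ols(
    "cost_per_gram ~ grams + ships_from + quality + cost_per_gram_pure + rating",
    data=toy,
).fit()
print("R-squared:", round(lm3.rsquared, 3))
```
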
In [3196]:
lm3.rsquared
Out[3196]:
0.9933983538512833
In [3205]:
lm3.summary()
Out[3205]:
OLS Regression Results
Dep. Variable: cost_per_gram R-squared: 0.993
Model: OLS Adj. R-squared: 0.993
Method: Least Squares F-statistic: 4.508e+04
Date: Thu, 05 Mar 2020 Prob (F-statistic): 0.00
Time: 11:02:00 Log-Likelihood: 7856.6
No. Observations: 1504 AIC: -1.570e+04
Df Residuals: 1498 BIC: -1.567e+04
Df Model: 5
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
Intercept -0.0369 0.001 -24.725 0.000 -0.040 -0.034
grams -3.997e-07 1.43e-07 -2.786 0.005 -6.81e-07 -1.18e-07
ships_from -3.048e-05 1.49e-05 -2.047 0.041 -5.97e-05 -1.27e-06
quality 0.0004 5.59e-06 69.612 0.000 0.000 0.000
cost_per_gram_pure 0.8700 0.002 449.376 0.000 0.866 0.874
rating 0.0006 0.000 2.194 0.028 6.59e-05 0.001
Omnibus: 1625.834 Durbin-Watson: 1.803
Prob(Omnibus): 0.000 Jarque-Bera (JB): 336292.680
Skew: -4.868 Prob(JB): 0.00
Kurtosis: 75.606 Cond. No. 1.42e+04


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.42e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
In [3209]:
Xs=datoscoke[feature_colssimple]
Ys=datoscoke["cost_per_gram"]
In [3211]:
lmsimple = LinearRegression()
lmsimple.fit(Xs,Ys)
Out[3211]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)
In [3213]:
print(lmsimple.intercept_)
print(lmsimple.coef_)
0.000740503332923817
[0.86211033]
In [3215]:
list(zip(feature_colssimple, lmsimple.coef_))
Out[3215]:
[('cost_per_gram_pure', 0.8621103274740817)]
In [3217]:
lmsimple.score(Xs, Ys)  # R-squared
Out[3217]:
0.9717698330393927

Comparison of linear models

In [3220]:
datoscoke[['cost_per_gram','R_simple','R_mult']].head(15)
Out[3220]:
cost_per_gram R_simple R_mult
0 0.025770 0.025426 0.025884
1 0.025750 0.025406 0.025864
2 0.032980 0.032687 0.033016
3 0.041200 0.040649 0.041058
4 0.034000 0.034432 0.033899
5 0.027050 0.027545 0.026945
6 0.031150 0.031608 0.031049
7 0.029667 0.030138 0.029565
8 0.028340 0.028823 0.028237
9 0.023460 0.027707 0.022475
10 0.019370 0.023006 0.017731
11 0.014643 0.017572 0.012242
12 0.032280 0.030034 0.032607
13 0.017600 0.020971 0.015677
14 0.013207 0.015922 0.010571
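A single error number per column makes the comparison easier to read than eyeballing rows; a sketch computing RMSE over the five rows shown at the top of the table:

```python
import numpy as np
import pandas as pd

# First five rows of the comparison table above.
cmp = pd.DataFrame({
    "cost_per_gram": [0.025770, 0.025750, 0.032980, 0.041200, 0.034000],
    "R_simple":      [0.025426, 0.025406, 0.032687, 0.040649, 0.034432],
    "R_mult":        [0.025884, 0.025864, 0.033016, 0.041058, 0.033899],
})

rmses = {
    model: float(np.sqrt(((cmp[model] - cmp["cost_per_gram"]) ** 2).mean()))
    for model in ["R_simple", "R_mult"]
}
print(rmses)  # on these rows the multiple model tracks the target more closely
```
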

Model validation

In [3222]:
check = (a < 0.8)
training = datoscoke[check]    # training set (~80%)
testing = datoscoke[~check]    # testing set (~20%)
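The mask `a` used above is presumably a vector of uniform random draws, one per row; a self-contained sketch of the same 80/20 split on an invented frame:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
toy = pd.DataFrame({"cost_per_gram": rng.uniform(0.01, 0.05, 1504)})

a = rng.random(len(toy))     # one uniform draw in [0, 1) per row
check = a < 0.8
training = toy[check]        # ~80% of the rows
testing = toy[~check]        # the remaining ~20%
print(len(training), len(testing))
```
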
In [3229]:
lm5.summary()  # training data
Out[3229]:
OLS Regression Results
Dep. Variable: cost_per_gram R-squared: 0.994
Model: OLS Adj. R-squared: 0.994
Method: Least Squares F-statistic: 3.822e+04
Date: Thu, 05 Mar 2020 Prob (F-statistic): 0.00
Time: 11:02:01 Log-Likelihood: 6373.6
No. Observations: 1203 AIC: -1.274e+04
Df Residuals: 1197 BIC: -1.270e+04
Df Model: 5
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
Intercept -0.0335 0.001 -63.425 0.000 -0.035 -0.033
grams 1.48e-06 9.39e-07 1.575 0.115 -3.63e-07 3.32e-06
ships_from -2.161e-05 1.54e-05 -1.399 0.162 -5.19e-05 8.7e-06
btc_price -0.0001 5.51e-05 -1.934 0.053 -0.000 1.55e-06
quality 0.0004 5.89e-06 65.254 0.000 0.000 0.000
cost_per_gram_pure 0.8728 0.002 415.385 0.000 0.869 0.877
Omnibus: 1306.282 Durbin-Watson: 2.009
Prob(Omnibus): 0.000 Jarque-Bera (JB): 278728.158
Skew: -4.827 Prob(JB): 0.00
Kurtosis: 76.942 Cond. No. 1.30e+04


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.3e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
In [3230]:
lm5test = smf.ols(formula="cost_per_gram ~ grams+ships_from+btc_price+quality+cost_per_gram_pure", data=testing).fit()
# test data
In [3231]:
lm5test.summary()
Out[3231]:
OLS Regression Results
Dep. Variable: cost_per_gram R-squared: 0.992
Model: OLS Adj. R-squared: 0.992
Method: Least Squares F-statistic: 7760.
Date: Thu, 05 Mar 2020 Prob (F-statistic): 1.23e-310
Time: 11:02:01 Log-Likelihood: 1508.1
No. Observations: 301 AIC: -3004.
Df Residuals: 295 BIC: -2982.
Df Model: 5
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
Intercept -0.0349 0.001 -26.623 0.000 -0.037 -0.032
grams -1.128e-06 8.73e-07 -1.292 0.197 -2.85e-06 5.9e-07
ships_from -3.766e-05 4.05e-05 -0.929 0.354 -0.000 4.21e-05
btc_price 5.142e-05 7.03e-05 0.732 0.465 -8.69e-05 0.000
quality 0.0004 1.47e-05 27.443 0.000 0.000 0.000
cost_per_gram_pure 0.8651 0.005 184.786 0.000 0.856 0.874
Omnibus: 328.803 Durbin-Watson: 1.971
Prob(Omnibus): 0.000 Jarque-Bera (JB): 37906.414
Skew: -4.225 Prob(JB): 0.00
Kurtosis: 57.323 Cond. No. 1.71e+04


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.71e+04. This might indicate that there are
strong multicollinearity or other numerical problems.

Multiple linear prediction without the dominant variable

In [3232]:
feature_cols2 = ["grams","ships_from","quality","rating","btc_price","successful_transactions"]
from sklearn.linear_model import LinearRegression 
In [3237]:
lmsin.score(X2, Y2)  # R-squared
Out[3237]:
0.1081128838542389
In [3240]:
datoscoke[['cost_per_gram','R_simple','R_mult','R_SinV']].head(15)
Out[3240]:
cost_per_gram R_simple R_mult R_SinV
0 0.025770 0.025426 0.025884 0.024862
1 0.025750 0.025406 0.025864 0.024855
2 0.032980 0.032687 0.033016 0.029230
3 0.041200 0.040649 0.041058 0.030291
4 0.034000 0.034432 0.033899 0.025938
5 0.027050 0.027545 0.026945 0.025873
6 0.031150 0.031608 0.031049 0.025932
7 0.029667 0.030138 0.029565 0.025925
8 0.028340 0.028823 0.028237 0.025910
9 0.023460 0.027707 0.022475 0.029381
10 0.019370 0.023006 0.017731 0.029375
11 0.014643 0.017572 0.012242 0.029219
12 0.032280 0.030034 0.032607 0.032758
13 0.017600 0.020971 0.015677 0.029364
14 0.013207 0.015922 0.010571 0.029036

Regression tree

In [3249]:
datoscoke[['cost_per_gram','R_simple','R_mult','R_SinV','Arbol']].head(15)
Out[3249]:
cost_per_gram R_simple R_mult R_SinV Arbol
0 0.025770 0.025426 0.025884 0.024862 0.025672
1 0.025750 0.025406 0.025864 0.024855 0.025672
2 0.032980 0.032687 0.033016 0.029230 0.033109
3 0.041200 0.040649 0.041058 0.030291 0.041413
4 0.034000 0.034432 0.033899 0.025938 0.034863
5 0.027050 0.027545 0.026945 0.025873 0.026905
6 0.031150 0.031608 0.031049 0.025932 0.031465
7 0.029667 0.030138 0.029565 0.025925 0.030601
8 0.028340 0.028823 0.028237 0.025910 0.027617
9 0.023460 0.027707 0.022475 0.029381 0.022131
10 0.019370 0.023006 0.017731 0.029375 0.019339
11 0.014643 0.017572 0.012242 0.029219 0.016617
12 0.032280 0.030034 0.032607 0.032758 0.033233
13 0.017600 0.020971 0.015677 0.029364 0.015891
14 0.013207 0.015922 0.010571 0.029036 0.014825
In [3250]:
from sklearn.tree import export_graphviz

# Export the fitted tree in DOT format; the with-block closes the file itself.
with open("img/arbold_coca.dot", "w") as dotfile:
    export_graphviz(regtree, out_file=dotfile, feature_names=predictors)

from graphviz import Source

# Read the DOT file back and render it.
with open("img/arbold_coca.dot", "r") as dotfile:
    text = dotfile.read()
Source(text)
Out[3250]:
[Rendered regression tree omitted: the root splits on cost_per_gram_pure <= 0.073 over 1504 samples, almost every subsequent split uses cost_per_gram_pure or quality, and the leaves predict cost_per_gram values between 0.003 and 0.122.]
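The `regtree` exported above is presumably scikit-learn's `DecisionTreeRegressor`; a minimal fitting sketch on invented data with the two predictors that dominate the splits:

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Invented data: price driven mostly by purity, slightly by quality.
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "cost_per_gram_pure": rng.uniform(0.005, 0.12, n),
    "quality": rng.uniform(70, 98, n),
})
y = 0.87 * toy["cost_per_gram_pure"] + 0.0004 * toy["quality"]

predictors = ["cost_per_gram_pure", "quality"]
regtree = DecisionTreeRegressor(min_samples_leaf=10, random_state=0)
regtree.fit(toy[predictors], y)
print("in-sample R^2:", round(regtree.score(toy[predictors], y), 3))
```
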
In [3258]:
#datoscoke[['cost_per_gram','R_simple','R_mult','R_SinV','Arbol','predicssvd']].head(15)
In [3265]:
from sklearn.metrics import accuracy_score

#score1 = accuracy_score ()

Clustering

Distance matrix

  • Matrix based on a single variable
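A one-variable distance matrix like `ddsimple` can be built with scipy's `pdist`; a sketch on three vendors from the head shown below, with the helper `dm_to_df2` replaced by a plain DataFrame wrap:

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Three vendors and ratings taken from the head of matrizSimple.
matrizSimple = pd.DataFrame({
    "vendor_name": ["Mister-Molly", "0ldamsterdamm", "lhomme-masquer"],
    "rating": [4.63, 4.94, 5.00],
})

# Pairwise euclidean distance on a single column = absolute rating difference.
dd = squareform(pdist(matrizSimple[["rating"]]))
dist_df = pd.DataFrame(dd, index=matrizSimple["vendor_name"],
                       columns=matrizSimple["vendor_name"])
print(dist_df.round(2))
```
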
In [3267]:
matrizSimple =  datoscoke.loc[:, ["vendor_name", "rating"]].copy()
vars =matrizSimple.columns.values.tolist()[1:4]
vars
Out[3267]:
['rating']
In [3268]:
matrizSimple.head(5)
Out[3268]:
vendor_name rating
0 Mister-Molly 4.63
1 Mister-Molly 4.63
2 0ldamsterdamm 4.94
3 lhomme-masquer 5.00
4 SMOOTHCRIMINAL007 4.78
In [3271]:
matrizSimplecompleta = dm_to_df2(ddsimple, matrizSimple["vendor_name"])
matrizSimplecompleta
Out[3271]:
vendor_name Mister-Molly Mister-Molly 0ldamsterdamm lhomme-masquer SMOOTHCRIMINAL007 SMOOTHCRIMINAL007 SMOOTHCRIMINAL007 SMOOTHCRIMINAL007 SMOOTHCRIMINAL007 cocaineuk ... gomorraamsterdam gomorraamsterdam gomorraamsterdam gomorraamsterdam gomorraamsterdam gomorraamsterdam gomorraamsterdam gomorraamsterdam gomorraamsterdam gomorraamsterdam
vendor_name
Mister-Molly 0.00 0.00 0.31 0.37 0.15 0.15 0.15 0.15 0.15 0.30 ... 0.23 0.23 0.23 0.23 0.23 0.23 0.23 0.23 0.23 0.23
Mister-Molly 0.00 0.00 0.31 0.37 0.15 0.15 0.15 0.15 0.15 0.30 ... 0.23 0.23 0.23 0.23 0.23 0.23 0.23 0.23 0.23 0.23
0ldamsterdamm 0.31 0.31 0.00 0.06 0.16 0.16 0.16 0.16 0.16 0.01 ... 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08
lhomme-masquer 0.37 0.37 0.06 0.00 0.22 0.22 0.22 0.22 0.22 0.07 ... 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14
SMOOTHCRIMINAL007 0.15 0.15 0.16 0.22 0.00 0.00 0.00 0.00 0.00 0.15 ... 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08
SMOOTHCRIMINAL007 0.15 0.15 0.16 0.22 0.00 0.00 0.00 0.00 0.00 0.15 ... 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08
SMOOTHCRIMINAL007 0.15 0.15 0.16 0.22 0.00 0.00 0.00 0.00 0.00 0.15 ... 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08
SMOOTHCRIMINAL007 0.15 0.15 0.16 0.22 0.00 0.00 0.00 0.00 0.00 0.15 ... 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08
SMOOTHCRIMINAL007 0.15 0.15 0.16 0.22 0.00 0.00 0.00 0.00 0.00 0.15 ... 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08
cocaineuk 0.30 0.30 0.01 0.07 0.15 0.15 0.15 0.15 0.15 0.00 ... 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07
cocaineuk 0.30 0.30 0.01 0.07 0.15 0.15 0.15 0.15 0.15 0.00 ... 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07
cocaineuk 0.30 0.30 0.01 0.07 0.15 0.15 0.15 0.15 0.15 0.00 ... 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07
cocaineuk 0.30 0.30 0.01 0.07 0.15 0.15 0.15 0.15 0.15 0.00 ... 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07
cocaineuk 0.30 0.30 0.01 0.07 0.15 0.15 0.15 0.15 0.15 0.00 ... 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07
cocaineuk 0.30 0.30 0.01 0.07 0.15 0.15 0.15 0.15 0.15 0.00 ... 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07
cocaineuk 0.30 0.30 0.01 0.07 0.15 0.15 0.15 0.15 0.15 0.00 ... 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07
cocaineuk 0.30 0.30 0.01 0.07 0.15 0.15 0.15 0.15 0.15 0.00 ... 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07
cocaineuk 0.30 0.30 0.01 0.07 0.15 0.15 0.15 0.15 0.15 0.00 ... 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07
cocaineuk 0.30 0.30 0.01 0.07 0.15 0.15 0.15 0.15 0.15 0.00 ... 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07
cocaineuk 0.30 0.30 0.01 0.07 0.15 0.15 0.15 0.15 0.15 0.00 ... 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07
cocaineuk 0.30 0.30 0.01 0.07 0.15 0.15 0.15 0.15 0.15 0.00 ... 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07
cocaineuk 0.30 0.30 0.01 0.07 0.15 0.15 0.15 0.15 0.15 0.00 ... 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07
cocaineuk 0.30 0.30 0.01 0.07 0.15 0.15 0.15 0.15 0.15 0.00 ... 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07 0.07
Barrio 0.37 0.37 0.06 0.00 0.22 0.22 0.22 0.22 0.22 0.07 ... 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14
Barrio 0.37 0.37 0.06 0.00 0.22 0.22 0.22 0.22 0.22 0.07 ... 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14
Barrio 0.37 0.37 0.06 0.00 0.22 0.22 0.22 0.22 0.22 0.07 ... 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14
Barrio 0.37 0.37 0.06 0.00 0.22 0.22 0.22 0.22 0.22 0.07 ... 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14
Barrio 0.37 0.37 0.06 0.00 0.22 0.22 0.22 0.22 0.22 0.07 ... 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14
LONDON-UNDERGROUND 0.24 0.24 0.07 0.13 0.09 0.09 0.09 0.09 0.09 0.06 ... 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01
LONDON-UNDERGROUND 0.24 0.24 0.07 0.13 0.09 0.09 0.09 0.09 0.09 0.06 ... 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
FastGermanDealer 0.34 0.34 0.03 0.03 0.19 0.19 0.19 0.19 0.19 0.04 ... 0.11 0.11 0.11 0.11 0.11 0.11 0.11 0.11 0.11 0.11
FastGermanDealer 0.34 0.34 0.03 0.03 0.19 0.19 0.19 0.19 0.19 0.04 ... 0.11 0.11 0.11 0.11 0.11 0.11 0.11 0.11 0.11 0.11
FastGermanDealer 0.34 0.34 0.03 0.03 0.19 0.19 0.19 0.19 0.19 0.04 ... 0.11 0.11 0.11 0.11 0.11 0.11 0.11 0.11 0.11 0.11
FastGermanDealer 0.34 0.34 0.03 0.03 0.19 0.19 0.19 0.19 0.19 0.04 ... 0.11 0.11 0.11 0.11 0.11 0.11 0.11 0.11 0.11 0.11
FlyEmiratess 0.05 0.05 0.36 0.42 0.20 0.20 0.20 0.20 0.20 0.35 ... 0.28 0.28 0.28 0.28 0.28 0.28 0.28 0.28 0.28 0.28
FlyEmiratess 0.05 0.05 0.36 0.42 0.20 0.20 0.20 0.20 0.20 0.35 ... 0.28 0.28 0.28 0.28 0.28 0.28 0.28 0.28 0.28 0.28
DarknetFR 0.37 0.37 0.06 0.00 0.22 0.22 0.22 0.22 0.22 0.07 ... 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14
FlyEmiratess 0.05 0.05 0.36 0.42 0.20 0.20 0.20 0.20 0.20 0.35 ... 0.28 0.28 0.28 0.28 0.28 0.28 0.28 0.28 0.28 0.28
FlyEmiratess 0.05 0.05 0.36 0.42 0.20 0.20 0.20 0.20 0.20 0.35 ... 0.28 0.28 0.28 0.28 0.28 0.28 0.28 0.28 0.28 0.28
DarknetFR 0.37 0.37 0.06 0.00 0.22 0.22 0.22 0.22 0.22 0.07 ... 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14
DarknetFR 0.37 0.37 0.06 0.00 0.22 0.22 0.22 0.22 0.22 0.07 ... 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14
FlyEmiratess 0.05 0.05 0.36 0.42 0.20 0.20 0.20 0.20 0.20 0.35 ... 0.28 0.28 0.28 0.28 0.28 0.28 0.28 0.28 0.28 0.28
FlyEmiratess 0.05 0.05 0.36 0.42 0.20 0.20 0.20 0.20 0.20 0.35 ... 0.28 0.28 0.28 0.28 0.28 0.28 0.28 0.28 0.28 0.28
DarknetFR 0.37 0.37 0.06 0.00 0.22 0.22 0.22 0.22 0.22 0.07 ... 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14
DarknetFR 0.37 0.37 0.06 0.00 0.22 0.22 0.22 0.22 0.22 0.07 ... 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14
FlyEmiratess 0.05 0.05 0.36 0.42 0.20 0.20 0.20 0.20 0.20 0.35 ... 0.28 0.28 0.28 0.28 0.28 0.28 0.28 0.28 0.28 0.28
FlyEmiratess 0.05 0.05 0.36 0.42 0.20 0.20 0.20 0.20 0.20 0.35 ... 0.28 0.28 0.28 0.28 0.28 0.28 0.28 0.28 0.28 0.28
DarknetFR 0.37 0.37 0.06 0.00 0.22 0.22 0.22 0.22 0.22 0.07 ... 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14 0.14
gomorraamsterdam 0.23 0.23 0.08 0.14 0.08 0.08 0.08 0.08 0.08 0.07 ... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
gomorraamsterdam 0.23 0.23 0.08 0.14 0.08 0.08 0.08 0.08 0.08 0.07 ... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
gomorraamsterdam 0.23 0.23 0.08 0.14 0.08 0.08 0.08 0.08 0.08 0.07 ... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
gomorraamsterdam 0.23 0.23 0.08 0.14 0.08 0.08 0.08 0.08 0.08 0.07 ... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
gomorraamsterdam 0.23 0.23 0.08 0.14 0.08 0.08 0.08 0.08 0.08 0.07 ... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
gomorraamsterdam 0.23 0.23 0.08 0.14 0.08 0.08 0.08 0.08 0.08 0.07 ... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
gomorraamsterdam 0.23 0.23 0.08 0.14 0.08 0.08 0.08 0.08 0.08 0.07 ... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
gomorraamsterdam 0.23 0.23 0.08 0.14 0.08 0.08 0.08 0.08 0.08 0.07 ... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
gomorraamsterdam 0.23 0.23 0.08 0.14 0.08 0.08 0.08 0.08 0.08 0.07 ... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
gomorraamsterdam 0.23 0.23 0.08 0.14 0.08 0.08 0.08 0.08 0.08 0.07 ... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
gomorraamsterdam 0.23 0.23 0.08 0.14 0.08 0.08 0.08 0.08 0.08 0.07 ... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
gomorraamsterdam 0.23 0.23 0.08 0.14 0.08 0.08 0.08 0.08 0.08 0.07 ... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

1504 rows × 1504 columns

In [3272]:
matriz = datoscoke.loc[:, ["vendor_name","quality", "rating", "successful_transactions"]].copy()
vars=matriz.columns.values.tolist()[1:4]
vars
Out[3272]:
['quality', 'rating', 'successful_transactions']
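The condensed distance array `dd1` passed to `dm_to_df` further below is computed outside this section. Judging from the printed matrix (for example, Mister-Molly vs. lhomme-masquer: |90−89| + |4.63−5.00| + |90−15| = 76.37), the metric appears to be Manhattan (cityblock). A minimal, self-contained sketch of that computation, using the first rows shown in `matriz.head()`:

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# First rows of `matriz` as shown above (real values from the notebook output)
matriz = pd.DataFrame({
    "vendor_name": ["Mister-Molly", "0ldamsterdamm", "lhomme-masquer"],
    "quality": [90.0, 89.0, 89.0],
    "rating": [4.63, 4.94, 5.00],
    "successful_transactions": [90, 620, 15],
})
vars = matriz.columns.values.tolist()[1:4]

# Condensed pairwise Manhattan (cityblock) distances over the numeric columns
dd1 = pdist(matriz[vars], metric="cityblock")
dm = squareform(dd1)  # expand to a symmetric n x n matrix
```

With these rows, `round(dm[0, 1], 2)` reproduces the 531.31 printed for Mister-Molly vs. 0ldamsterdamm, which is what suggests the cityblock metric.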
In [3273]:
matriz.head()
Out[3273]:
vendor_name quality rating successful_transactions
0 Mister-Molly 90.0 4.63 90
1 Mister-Molly 90.0 4.63 90
2 0ldamsterdamm 89.0 4.94 620
3 lhomme-masquer 89.0 5.00 15
4 SMOOTHCRIMINAL007 87.0 4.78 28

Distance matrix between vendors with multiple variables

In [3276]:
matrizcompleta = dm_to_df(dd1, matriz["vendor_name"])
matrizcompleta
Out[3276]:
vendor_name Mister-Molly Mister-Molly 0ldamsterdamm lhomme-masquer SMOOTHCRIMINAL007 SMOOTHCRIMINAL007 SMOOTHCRIMINAL007 SMOOTHCRIMINAL007 SMOOTHCRIMINAL007 cocaineuk ... gomorraamsterdam gomorraamsterdam gomorraamsterdam gomorraamsterdam gomorraamsterdam gomorraamsterdam gomorraamsterdam gomorraamsterdam gomorraamsterdam gomorraamsterdam
vendor_name
Mister-Molly 0.00 0.00 531.31 76.37 65.15 65.15 65.15 65.15 65.15 305.30 ... 272.23 272.23 272.23 272.23 272.23 272.23 272.23 272.23 272.23 272.23
Mister-Molly 0.00 0.00 531.31 76.37 65.15 65.15 65.15 65.15 65.15 305.30 ... 272.23 272.23 272.23 272.23 272.23 272.23 272.23 272.23 272.23 272.23
0ldamsterdamm 531.31 531.31 0.00 605.06 594.16 594.16 594.16 594.16 594.16 254.01 ... 263.08 263.08 263.08 263.08 263.08 263.08 263.08 263.08 263.08 263.08
lhomme-masquer 76.37 76.37 605.06 0.00 15.22 15.22 15.22 15.22 15.22 379.07 ... 348.14 348.14 348.14 348.14 348.14 348.14 348.14 348.14 348.14 348.14
SMOOTHCRIMINAL007 65.15 65.15 594.16 15.22 0.00 0.00 0.00 0.00 0.00 364.15 ... 337.08 337.08 337.08 337.08 337.08 337.08 337.08 337.08 337.08 337.08
SMOOTHCRIMINAL007 65.15 65.15 594.16 15.22 0.00 0.00 0.00 0.00 0.00 364.15 ... 337.08 337.08 337.08 337.08 337.08 337.08 337.08 337.08 337.08 337.08
SMOOTHCRIMINAL007 65.15 65.15 594.16 15.22 0.00 0.00 0.00 0.00 0.00 364.15 ... 337.08 337.08 337.08 337.08 337.08 337.08 337.08 337.08 337.08 337.08
SMOOTHCRIMINAL007 65.15 65.15 594.16 15.22 0.00 0.00 0.00 0.00 0.00 364.15 ... 337.08 337.08 337.08 337.08 337.08 337.08 337.08 337.08 337.08 337.08
SMOOTHCRIMINAL007 65.15 65.15 594.16 15.22 0.00 0.00 0.00 0.00 0.00 364.15 ... 337.08 337.08 337.08 337.08 337.08 337.08 337.08 337.08 337.08 337.08
cocaineuk 305.30 305.30 254.01 379.07 364.15 364.15 364.15 364.15 364.15 0.00 ... 37.07 37.07 37.07 37.07 37.07 37.07 37.07 37.07 37.07 37.07
cocaineuk 305.30 305.30 254.01 379.07 364.15 364.15 364.15 364.15 364.15 0.00 ... 37.07 37.07 37.07 37.07 37.07 37.07 37.07 37.07 37.07 37.07
cocaineuk 305.30 305.30 254.01 379.07 364.15 364.15 364.15 364.15 364.15 0.00 ... 37.07 37.07 37.07 37.07 37.07 37.07 37.07 37.07 37.07 37.07
cocaineuk 295.30 295.30 246.01 371.07 360.15 360.15 360.15 360.15 360.15 20.00 ... 23.07 23.07 23.07 23.07 23.07 23.07 23.07 23.07 23.07 23.07
cocaineuk 305.30 305.30 254.01 379.07 364.15 364.15 364.15 364.15 364.15 0.00 ... 37.07 37.07 37.07 37.07 37.07 37.07 37.07 37.07 37.07 37.07
cocaineuk 305.30 305.30 254.01 379.07 364.15 364.15 364.15 364.15 364.15 0.00 ... 37.07 37.07 37.07 37.07 37.07 37.07 37.07 37.07 37.07 37.07
cocaineuk 295.30 295.30 246.01 371.07 360.15 360.15 360.15 360.15 360.15 20.00 ... 23.07 23.07 23.07 23.07 23.07 23.07 23.07 23.07 23.07 23.07
cocaineuk 305.30 305.30 254.01 379.07 364.15 364.15 364.15 364.15 364.15 0.00 ... 37.07 37.07 37.07 37.07 37.07 37.07 37.07 37.07 37.07 37.07
cocaineuk 295.30 295.30 246.01 371.07 360.15 360.15 360.15 360.15 360.15 20.00 ... 23.07 23.07 23.07 23.07 23.07 23.07 23.07 23.07 23.07 23.07
cocaineuk 305.30 305.30 254.01 379.07 364.15 364.15 364.15 364.15 364.15 0.00 ... 37.07 37.07 37.07 37.07 37.07 37.07 37.07 37.07 37.07 37.07
cocaineuk 305.30 305.30 254.01 379.07 364.15 364.15 364.15 364.15 364.15 0.00 ... 37.07 37.07 37.07 37.07 37.07 37.07 37.07 37.07 37.07 37.07
cocaineuk 305.30 305.30 254.01 379.07 364.15 364.15 364.15 364.15 364.15 0.00 ... 37.07 37.07 37.07 37.07 37.07 37.07 37.07 37.07 37.07 37.07
cocaineuk 295.30 295.30 246.01 371.07 360.15 360.15 360.15 360.15 360.15 20.00 ... 23.07 23.07 23.07 23.07 23.07 23.07 23.07 23.07 23.07 23.07
cocaineuk 295.30 295.30 246.01 371.07 360.15 360.15 360.15 360.15 360.15 20.00 ... 23.07 23.07 23.07 23.07 23.07 23.07 23.07 23.07 23.07 23.07
Barrio 10.37 10.37 521.06 86.00 75.22 75.22 75.22 75.22 75.22 295.07 ... 262.14 262.14 262.14 262.14 262.14 262.14 262.14 262.14 262.14 262.14
Barrio 10.37 10.37 521.06 86.00 75.22 75.22 75.22 75.22 75.22 295.07 ... 262.14 262.14 262.14 262.14 262.14 262.14 262.14 262.14 262.14 262.14
Barrio 10.37 10.37 521.06 86.00 75.22 75.22 75.22 75.22 75.22 295.07 ... 262.14 262.14 262.14 262.14 262.14 262.14 262.14 262.14 262.14 262.14
Barrio 10.37 10.37 521.06 86.00 75.22 75.22 75.22 75.22 75.22 295.07 ... 262.14 262.14 262.14 262.14 262.14 262.14 262.14 262.14 262.14 262.14
Barrio 10.37 10.37 521.06 86.00 75.22 75.22 75.22 75.22 75.22 295.07 ... 262.14 262.14 262.14 262.14 262.14 262.14 262.14 262.14 262.14 262.14
LONDON-UNDERGROUND 90.24 90.24 519.07 164.13 149.09 149.09 149.09 149.09 149.09 265.06 ... 262.01 262.01 262.01 262.01 262.01 262.01 262.01 262.01 262.01 262.01
LONDON-UNDERGROUND 90.24 90.24 519.07 164.13 149.09 149.09 149.09 149.09 149.09 265.06 ... 262.01 262.01 262.01 262.01 262.01 262.01 262.01 262.01 262.01 262.01
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
FastGermanDealer 610.34 610.34 81.03 686.03 675.19 675.19 675.19 675.19 675.19 335.04 ... 342.11 342.11 342.11 342.11 342.11 342.11 342.11 342.11 342.11 342.11
FastGermanDealer 610.34 610.34 81.03 686.03 675.19 675.19 675.19 675.19 675.19 335.04 ... 342.11 342.11 342.11 342.11 342.11 342.11 342.11 342.11 342.11 342.11
FastGermanDealer 610.34 610.34 81.03 686.03 675.19 675.19 675.19 675.19 675.19 335.04 ... 342.11 342.11 342.11 342.11 342.11 342.11 342.11 342.11 342.11 342.11
FastGermanDealer 610.34 610.34 81.03 686.03 675.19 675.19 675.19 675.19 675.19 335.04 ... 342.11 342.11 342.11 342.11 342.11 342.11 342.11 342.11 342.11 342.11
FlyEmiratess 98.05 98.05 627.36 22.42 33.20 33.20 33.20 33.20 33.20 373.35 ... 370.28 370.28 370.28 370.28 370.28 370.28 370.28 370.28 370.28 370.28
FlyEmiratess 80.05 80.05 611.36 6.42 21.20 21.20 21.20 21.20 21.20 385.35 ... 348.28 348.28 348.28 348.28 348.28 348.28 348.28 348.28 348.28 348.28
DarknetFR 90.37 90.37 619.06 14.00 29.22 29.22 29.22 29.22 29.22 393.07 ... 362.14 362.14 362.14 362.14 362.14 362.14 362.14 362.14 362.14 362.14
FlyEmiratess 98.05 98.05 627.36 22.42 33.20 33.20 33.20 33.20 33.20 373.35 ... 370.28 370.28 370.28 370.28 370.28 370.28 370.28 370.28 370.28 370.28
FlyEmiratess 80.05 80.05 611.36 6.42 21.20 21.20 21.20 21.20 21.20 385.35 ... 348.28 348.28 348.28 348.28 348.28 348.28 348.28 348.28 348.28 348.28
DarknetFR 90.37 90.37 619.06 14.00 29.22 29.22 29.22 29.22 29.22 393.07 ... 362.14 362.14 362.14 362.14 362.14 362.14 362.14 362.14 362.14 362.14
DarknetFR 90.37 90.37 619.06 14.00 29.22 29.22 29.22 29.22 29.22 393.07 ... 362.14 362.14 362.14 362.14 362.14 362.14 362.14 362.14 362.14 362.14
FlyEmiratess 98.05 98.05 627.36 22.42 33.20 33.20 33.20 33.20 33.20 373.35 ... 370.28 370.28 370.28 370.28 370.28 370.28 370.28 370.28 370.28 370.28
FlyEmiratess 80.05 80.05 611.36 6.42 21.20 21.20 21.20 21.20 21.20 385.35 ... 348.28 348.28 348.28 348.28 348.28 348.28 348.28 348.28 348.28 348.28
DarknetFR 90.37 90.37 619.06 14.00 29.22 29.22 29.22 29.22 29.22 393.07 ... 362.14 362.14 362.14 362.14 362.14 362.14 362.14 362.14 362.14 362.14
DarknetFR 90.37 90.37 619.06 14.00 29.22 29.22 29.22 29.22 29.22 393.07 ... 362.14 362.14 362.14 362.14 362.14 362.14 362.14 362.14 362.14 362.14
FlyEmiratess 98.05 98.05 627.36 22.42 33.20 33.20 33.20 33.20 33.20 373.35 ... 370.28 370.28 370.28 370.28 370.28 370.28 370.28 370.28 370.28 370.28
FlyEmiratess 80.05 80.05 611.36 6.42 21.20 21.20 21.20 21.20 21.20 385.35 ... 348.28 348.28 348.28 348.28 348.28 348.28 348.28 348.28 348.28 348.28
DarknetFR 90.37 90.37 619.06 14.00 29.22 29.22 29.22 29.22 29.22 393.07 ... 362.14 362.14 362.14 362.14 362.14 362.14 362.14 362.14 362.14 362.14
gomorraamsterdam 272.23 272.23 263.08 348.14 337.08 337.08 337.08 337.08 337.08 37.07 ... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
gomorraamsterdam 272.23 272.23 263.08 348.14 337.08 337.08 337.08 337.08 337.08 37.07 ... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
gomorraamsterdam 272.23 272.23 263.08 348.14 337.08 337.08 337.08 337.08 337.08 37.07 ... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
gomorraamsterdam 272.23 272.23 263.08 348.14 337.08 337.08 337.08 337.08 337.08 37.07 ... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
gomorraamsterdam 272.23 272.23 263.08 348.14 337.08 337.08 337.08 337.08 337.08 37.07 ... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
gomorraamsterdam 272.23 272.23 263.08 348.14 337.08 337.08 337.08 337.08 337.08 37.07 ... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
gomorraamsterdam 272.23 272.23 263.08 348.14 337.08 337.08 337.08 337.08 337.08 37.07 ... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
gomorraamsterdam 272.23 272.23 263.08 348.14 337.08 337.08 337.08 337.08 337.08 37.07 ... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
gomorraamsterdam 272.23 272.23 263.08 348.14 337.08 337.08 337.08 337.08 337.08 37.07 ... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
gomorraamsterdam 272.23 272.23 263.08 348.14 337.08 337.08 337.08 337.08 337.08 37.07 ... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
gomorraamsterdam 272.23 272.23 263.08 348.14 337.08 337.08 337.08 337.08 337.08 37.07 ... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
gomorraamsterdam 272.23 272.23 263.08 348.14 337.08 337.08 337.08 337.08 337.08 37.07 ... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

1504 rows × 1504 columns
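`dm_to_df` is a helper defined elsewhere in the notebook. A plausible reconstruction, assuming it simply wraps a condensed distance matrix in a DataFrame indexed and labeled by vendor name:

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform

def dm_to_df(dd, names):
    """Hypothetical reconstruction: expand a condensed distance
    matrix `dd` into a square DataFrame labeled by `names`."""
    names = list(names)
    return pd.DataFrame(squareform(dd), index=names, columns=names)

# Usage with toy 2-D points (hypothetical values): A and C coincide,
# A-B is a 3-4-5 triangle, so d(A, B) = 5 and d(A, C) = 0
puntos = pd.DataFrame({"x": [0.0, 3.0, 0.0], "y": [0.0, 4.0, 0.0]})
dd = pdist(puntos)  # default Euclidean metric
df = dm_to_df(dd, ["A", "B", "C"])
```

Because `matriz["vendor_name"]` contains repeated names, the real matrix above has duplicated row and column labels, which is why identical vendors appear several times.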

In [3277]:
molly=matriz.loc[matriz['vendor_name'] == "Mister-Molly"]
molly.head()
Out[3277]:
vendor_name quality rating successful_transactions
0 Mister-Molly 90.0 4.63 90
1 Mister-Molly 90.0 4.63 90
261 Mister-Molly 90.0 4.63 90
305 Mister-Molly 90.0 4.63 90
346 Mister-Molly 90.0 4.63 90
In [3278]:
matriz.loc[matriz['vendor_name'] == "Mister-Molly"].mean()
Out[3278]:
quality                    90.00
rating                      4.63
successful_transactions    90.00
dtype: float64
In [3279]:
gof=matriz.loc[matriz['vendor_name'] == "0ldamsterdamm"]
gof.head()
Out[3279]:
vendor_name quality rating successful_transactions
2 0ldamsterdamm 89.0 4.94 620
253 0ldamsterdamm 89.0 4.94 620
288 0ldamsterdamm 89.0 4.94 620
409 0ldamsterdamm 89.0 4.94 620
414 0ldamsterdamm 89.0 4.94 620
In [3280]:
matriz.loc[matriz['vendor_name'] == "0ldamsterdamm"].mean()
Out[3280]:
quality                     89.00
rating                       4.94
successful_transactions    620.00
dtype: float64
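Filtering one vendor at a time, as above for Mister-Molly and 0ldamsterdamm, generalizes: a single `groupby` yields the per-vendor means for every vendor at once. A sketch with toy rows shaped like `matriz`, using values from the outputs above:

```python
import pandas as pd

# Rows shaped like `matriz` (values taken from the notebook outputs)
matriz = pd.DataFrame({
    "vendor_name": ["Mister-Molly", "Mister-Molly", "0ldamsterdamm"],
    "quality": [90.0, 90.0, 89.0],
    "rating": [4.63, 4.63, 4.94],
    "successful_transactions": [90, 90, 620],
})

# Per-vendor means for every vendor in one pass
medias = matriz.groupby("vendor_name")[
    ["quality", "rating", "successful_transactions"]
].mean()
```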
In [3281]:
plt.figure(figsize=(15,10))

sns.scatterplot(x=datoscoke['quality'], y=datoscoke['successful_transactions'], hue=datoscoke['vendor_name'] == "Gofastteam")
Out[3281]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f32b267160>
In [3282]:
plt.figure(figsize=(15,10))

sns.scatterplot(x=datoscoke['quality'], y=datoscoke['cost_per_gram'], hue=datoscoke['vendor_name'] == "Gofastteam")
Out[3282]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f32b2db9e8>
In [3283]:
plt.figure(figsize=(15,10))

sns.scatterplot(x=datoscoke['rating'], y=datoscoke['successful_transactions'], hue=datoscoke['vendor_name'] == "mordekai")
Out[3283]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f32d481f28>
In [3284]:
plt.figure(figsize=(15,10))

sns.scatterplot(x=datoscoke['quality'], y=datoscoke['cost_per_gram'], hue=datoscoke['vendor_name'] == "mordekai")
Out[3284]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f321a88668>
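The four scatterplots above share one pattern: passing a boolean Series as `hue` paints the highlighted vendor in a second color against everyone else. A minimal headless sketch with made-up data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so no window is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy stand-in for `datoscoke` (hypothetical values)
datoscoke = pd.DataFrame({
    "quality": [90, 89, 87, 75],
    "successful_transactions": [90, 620, 28, 300],
    "vendor_name": ["Gofastteam", "otro", "Gofastteam", "otra"],
})

plt.figure(figsize=(15, 10))
# Boolean mask: True only for the vendor we want to single out
mask = datoscoke["vendor_name"] == "Gofastteam"
ax = sns.scatterplot(x=datoscoke["quality"],
                     y=datoscoke["successful_transactions"],
                     hue=mask)
```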

K-means

In [3306]:
km_norm.head(10)
Out[3306]:
quality successful_transactions clust_k
0 0.80 0.027393 1
1 0.80 0.027393 1
2 0.78 0.190520 1
3 0.78 0.004309 1
4 0.74 0.008310 0
5 0.74 0.008310 0
6 0.74 0.008310 0
7 0.74 0.008310 0
8 0.74 0.008310 0
9 0.50 0.116651 3
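`km_norm` above carries min-max-scaled variables plus a `clust_k` label; the exact pipeline is defined outside this section. A sketch of one plausible construction with toy values, assuming min-max normalization and scikit-learn's `KMeans`:

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy stand-in for the two clustering variables (hypothetical values)
km = pd.DataFrame({
    "quality": [90.0, 90.0, 89.0, 50.0, 10.0],
    "successful_transactions": [90, 90, 620, 380, 5],
})

# Min-max normalization (column-wise) so both variables weigh
# the same in the Euclidean distance used by K-means
km_norm = (km - km.min()) / (km.max() - km.min())

# Fit K-means and attach the cluster label as `clust_k`
modelo = KMeans(n_clusters=2, n_init=10, random_state=0)
km_norm["clust_k"] = modelo.fit_predict(km_norm)
```

Without the normalization, `successful_transactions` (hundreds) would dominate `quality` (0-100) and the clusters would reflect only one variable.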
In [3307]:
Image (filename="img/codo.png", width=800, height=800)
Out[3307]:
In [3308]:
plt.hist(md_k)
Out[3308]:
(array([ 280.,    0.,    0., 1145.,    0.,    0.,   10.,    0.,    0.,
          69.]),
 array([0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 2.1, 2.4, 2.7, 3. ]),
 <a list of 10 Patch objects>)
In [3354]:
Image (filename="img/quality.png", width=1000, height=1000)
Out[3354]:
In [3315]:
kmcodo=datoscoke.loc[:, [ "quality","successful_transactions"]].copy()
kmcodo.head()
Out[3315]:
quality successful_transactions
0 90.0 90
1 90.0 90
2 89.0 620
3 89.0 15
4 87.0 28
In [3321]:
plt.figure(figsize=(10,5))
plt.plot(K, distortions, 'bx-') 
plt.xlabel('K values') 
plt.ylabel('Distortion') 
plt.title('Number of clusters') 
plt.show() 
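The elbow plot above assumes `K` and `distortions` were computed beforehand. A sketch of how they could be obtained (toy data; sklearn's `inertia_`, the sum of squared distances to the nearest centroid, as the distortion measure):

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy stand-in for `kmcodo` (hypothetical values)
kmcodo = pd.DataFrame({
    "quality": [90.0, 89.0, 89.0, 87.0, 50.0, 10.0],
    "successful_transactions": [90, 620, 15, 28, 300, 5],
})

K = range(1, 6)
distortions = []
for k in K:
    modelo = KMeans(n_clusters=k, n_init=10, random_state=0)
    modelo.fit(kmcodo)
    # inertia_ shrinks as k grows; the "elbow" marks diminishing returns
    distortions.append(modelo.inertia_)
```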

K-means (rating vs. successful_transactions)

In [3342]:
Image (filename="img/rs.png", width=1000, height=1000)
Out[3342]:

Users

In [3345]:
Image (filename="img/pregunta.jpg", width=800, height=800)
Out[3345]:
In [3346]:
Image (filename="img/vendedores.jpg", width=800, height=800)
Out[3346]:
In [3349]:
Image (filename="img/compras.jpg", width=800, height=800)
Out[3349]:
In [3351]:
Image (filename="img/trust.png", width=800, height=800)
Out[3351]:
In [3352]:
Image (filename="img/scam.png", width=800, height=800)
Out[3352]:
In [3353]:
Image (filename="img/goff.jpg", width=800, height=800)
Out[3353]: